The VeteranTapes: Research Corpus, Fragment Processing Tool, and Enhanced Publications for the e-Humanities
نویسندگان
چکیده
Enhanced Publications are a new way to publish scientific and other results in an electronic article. The advantage of EPs is that the relation between the article and the underlying data facilitate the peer review process and other quality assessment activities. Due to the link between de publication and the research data the publication can be much richer than a paper edition permits. We present an example of EPs in which links are made to interview fragments that include transcripts, audio segments, annotations and metadata. EPs call for a new paradigm of research methodology in which digital persistent access to research data are a central issue. In this contribution we highlight 1. The research data as it is archived and curated, 2. the concept “enhanced publication” and its scientific value, 3. the “fragment fitter tool”, a language processing tool to facilitate the creation of EPs, 4. IPR issues related to the re-use of the interview data.
منابع مشابه
Corpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
متن کاملGutenTag: an NLP-driven Tool for Digital Humanities Research in the Project Gutenberg Corpus
This paper introduces a software tool, GutenTag, which is aimed at giving literary researchers direct access to NLP techniques for the analysis of texts in the Project Gutenberg corpus. We discuss several facets of the tool, including the handling of formatting and structure, the use and expansion of metadata which is used to identify relevant subcorpora of interest, and a general tagging frame...
متن کاملVocabulary Lists for EAP and Conversation Students
Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...
متن کاملMetadiscourse Markers in a Corpus of Learner Language: The Case of Iranian EFL Learners
Different issues have been probed in learner corpus research since the late 1980s.However, taking the im- portance of meta discourse markers (MDMs) in signposting academic discourse, their use in Iranian EFL learners‟ academic essays is an area of research in need of a more serious analysis. Contributing to this line of investigation, this paper reports a corpus-based study of the use of MDMs i...
متن کاملLinguistic Issues in Language Technology – LiLT
This contribution investigates novel techniques for error detection in automatic semantic annotations, as an attempt to reconcile error-prone NLP processing with high quality standards required for empirical research in Digital Humanities. We demonstrate the state-of-the-art performance of semantic NLP systems on a corpus of ritual texts and report performance gains we obtain using domain adapt...
متن کامل